11 research outputs found

Named entity extraction for speech

Author: Horlock James
Publication venue: The University of Edinburgh
Publication date: 01/01/2005

Named entity extraction is a field that has generated much interest over recent years with the explosion of the World Wide Web and the necessity for accurate information retrieval. Named entity extraction, the task of finding specific entities within documents, has proven of great benefit for numerous information extraction and information retrieval tasks.
As well as multiple language evaluations, named entity extraction has been investigated on a variety of media forms with varying success. In general, these media forms have all been based upon standard text and assumed that any variation from standard text constitutes noise.
We investigate how it is possible to find named entities in speech data. Where others have focussed on applying named entity extraction techniques to transcriptions of speech, we investigate a method for finding the named entities directly from the word lattices associated with the speech signal. The results show that it is possible to improve named entity recognition at the expense of word error rate (WER), in contrast to the general view that F-score is directly proportional to WER.
We use a Hidden Markov Model (HMM) style approach to the task of named entity extraction and show how it is possible to utilise an HMM to find named entities within speech lattices. We further investigate how it is possible to improve results by considering an alternative derivation of the joint probability of words and entities than is traditionally used. This new derivation is particularly appropriate to speech lattices as no presumptions are made about the sequence of words.
The HMM style approach that we use requires using a number of language models in parallel. We have developed a system for discriminatively retraining these language models based upon the results of the output, and we show how it is possible to improve named entity recognition by iterations over both training data and development data.
We also consider how part-of-speech (POS) tags can be used within word lattices. We devise a method of labelling a word lattice with POS tags and adapt the model to make use of these POS tags when producing the best path through the lattice. The resulting path provides the most likely sequence of words, entities and POS tags, and we show how this new path is better than the previous path, which ignored the POS tags.

Edinburgh Research Archive

Domain-specific Web site identification: the CROSSMARC focused Web crawler

Author: Curran James
Dingare Shipra
Grover Claire
Horlock James
Karkaletsis Vangelis
Paliouras Georgios
Stamatakis Konstantinos
Publication date: 01/01/2003

Edinburgh Research Explorer

Efficient clinical-grade γ-retroviral vector purification by high-speed centrifugation for CAR T cell manufacturing

Author: Abreu Sara
Ade-Onojobi Michael
Banani Mohammad Amin
Chen Jie
Day William
Domining Sabine
Farzaneh Farzin
Horlock Claire
Hussain Rehan
Khinder Ravin
Macmorland William
Madigan Meghan
Matsumoto Sofia
Mekkaoui Leila
Miah Shahed
Nikoniuk Aleksandra
Price Juliet
Pule Martin
Rubat Lydie
Sabatino Marianna
Sillibourne James
Slepushkin Vladimir
Smith Koval
Srivastava Saket
Stevenson Elena
Tejerizo Jose G
Walker Simon
Williams Sarah
Publication venue: Elsevier BV
Publication date: 09/12/2022

γ-Retroviral vectors (γ-RV) are powerful tools for gene therapy applications. Current clinical vectors are produced from stable producer cell lines which require minimal further downstream processing, while purification schemes for γ-RV produced by transient transfection have not been thoroughly investigated. We aimed to develop a method to purify transiently produced γ-RV for early clinical studies. Here, we report a simple one-step purification method by high-speed centrifugation for γ-RV produced by transient transfection for clinical application. High-speed centrifugation enabled the concentration of viral titers in the range of 10^7 to 10^8 TU/mL with >80% overall recovery. Analysis of research-grade concentrated vector revealed sufficient reduction in product- and process-related impurities. Furthermore, product characterization of clinical-grade γ-RV by BioReliance demonstrated two logs lower impurities per transducing unit compared with regulatory authority-approved stable producer cell line vector for clinical application. In terms of CAR T cell manufacturing, clinical-grade γ-RV produced by transient transfection and purified by high-speed centrifugation was similar to γ-RV produced from a clinical-grade stable producer cell line. This method will be of value for studies using γ-RV to bridge vector supply between early- and late-stage clinical trials.
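The titer and recovery figures in the abstract above can be related with simple arithmetic. The sketch below assumes the usual downstream-processing definition of recovery (total transducing units after concentration divided by total transducing units before); the abstract does not state which definition the authors used, and the function name and all input numbers are hypothetical, chosen only for illustration.

```python
def overall_recovery(titer_in, volume_in_ml, titer_out, volume_out_ml):
    """Fraction of transducing units (TU) retained after concentration.

    Titers are in TU/mL and volumes in mL. This definition is an assumption,
    not taken from the paper.
    """
    total_in = titer_in * volume_in_ml      # TU loaded into the centrifuge
    total_out = titer_out * volume_out_ml   # TU in the resuspended pellet
    return total_out / total_in


# Hypothetical run: 500 mL of supernatant at 1e6 TU/mL, resuspended in 5 mL.
recovery = overall_recovery(titer_in=1e6, volume_in_ml=500,
                            titer_out=8.5e7, volume_out_ml=5)
volume_reduction = 500 / 5  # 100-fold reduction in volume
print(f"recovery = {recovery:.0%}, volume reduction = {volume_reduction:.0f}x")
```

With these made-up inputs, a 100-fold volume reduction that raises the titer 85-fold corresponds to 85% recovery, which would fall in the ">80%" range the abstract reports.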

UCL Discovery

PubMed Central

Named Entity Extraction from Word Lattices

Author: Horlock James
King Simon
Publication venue: International Speech Communication Association
Publication date: 01/01/2003

We present a method for named entity extraction from word lattices produced by a speech recogniser. Previous work by others on named entity extraction from speech has used either a manual transcript or 1-best recogniser output. We describe how a single Viterbi search can recover both the named entity sequence and the corresponding word sequence from a word lattice, and show that it is possible to trade an increase in word error rate for improved named entity extraction.
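The idea of a single Viterbi search that returns words and entity tags together can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes a lattice given as edges between integer node ids that increase along every path, a tag bigram model, and a per-tag unigram word model. The function name, the toy lattice and all log-probabilities are invented for the example.

```python
from collections import defaultdict


def viterbi_lattice(edges, start, end, tags, trans_logp, emit_logp):
    """Best joint (word, tag) path through a word lattice.

    edges: (src, dst, word, recogniser_logp) tuples with src < dst.
    trans_logp(prev_tag, tag) and emit_logp(word, tag) return log-probabilities.
    """
    outgoing = defaultdict(list)
    for src, dst, word, logp in edges:
        outgoing[src].append((dst, word, logp))

    # best[node][tag] = (score, backpointer); backpointer = (prev_node, prev_tag, word)
    best = defaultdict(dict)
    best[start]["<s>"] = (0.0, None)
    for node in sorted({n for e in edges for n in e[:2]}):
        for prev_tag, (score, _) in list(best[node].items()):
            for dst, word, logp in outgoing[node]:
                for tag in tags:
                    cand = (score + logp
                            + trans_logp(prev_tag, tag) + emit_logp(word, tag))
                    if tag not in best[dst] or cand > best[dst][tag][0]:
                        best[dst][tag] = (cand, (node, prev_tag, word))

    if not best[end]:
        raise ValueError("end node is not reachable from start node")
    # Pick the best final tag, then follow backpointers to the start node.
    tag = max(best[end], key=lambda t: best[end][t][0])
    final_score, node, path = best[end][tag][0], end, []
    while best[node][tag][1] is not None:
        prev_node, prev_tag, word = best[node][tag][1]
        path.append((word, tag))
        node, tag = prev_node, prev_tag
    return final_score, path[::-1]


# Toy lattice: the recogniser slightly prefers "knew" over "new".
EDGES = [(0, 1, "i", -0.1), (1, 2, "knew", -0.5),
         (1, 2, "new", -0.9), (2, 3, "york", -0.2)]
TRANS = {("<s>", "O"): -0.2, ("<s>", "LOC"): -2.0, ("O", "O"): -0.3,
         ("O", "LOC"): -1.5, ("LOC", "LOC"): -0.4, ("LOC", "O"): -1.2}
EMIT = {("i", "O"): -1.0, ("knew", "O"): -2.0, ("new", "O"): -2.5,
        ("new", "LOC"): -1.0, ("york", "LOC"): -1.0, ("york", "O"): -4.0}

score, path = viterbi_lattice(
    EDGES, start=0, end=3, tags=["O", "LOC"],
    trans_logp=lambda p, t: TRANS.get((p, t), -10.0),
    emit_logp=lambda w, t: EMIT.get((w, t), -10.0))
print(score, path)
```

On this toy lattice the search selects "new" tagged LOC rather than the acoustically preferred "knew", because the combined score of "new york" as a location is higher. That is a small-scale analogue of accepting a word hypothesis the recogniser ranks lower in exchange for a better entity labelling.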

Edinburgh Research Archive

Discriminative Methods for Improving Named Entity Extraction on Speech Data

Author: Horlock James
King Simon
Publication venue: International Speech Communication Association
Publication date: 01/01/2003

In this paper we present a method of discriminatively training language models for spoken language understanding; we show improvements in named entity F-scores on speech data using these improved language models. A comparison between theoretical probabilities associated with manual markup and the actual probabilities of output markup is used to identify probabilities requiring adjustment. We present results which support our hypothesis that improvements in F-scores are possible by using either previously used training data or held-out development data to improve discrimination amongst a set of N-gram language models.
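For readers unfamiliar with the metric, the named entity F-score mentioned above is the weighted harmonic mean of precision and recall over extracted entities. The sketch below assumes exact-match scoring of (start, end, type) spans. The abstract does not describe the scoring procedure used for speech output, which typically also requires aligning recognised words to the reference, so the function and the example spans are illustrative only.

```python
def entity_f_score(reference, hypothesis, beta=1.0):
    """F-score over entity spans given as (start, end, type) tuples.

    A hypothesis entity counts as correct only if its span and type both
    match a reference entity exactly (an assumption of this sketch).
    """
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical markup: one entity matches, one has the wrong type, one is spurious.
reference = [(1, 3, "LOC"), (5, 6, "PER")]
hypothesis = [(1, 3, "LOC"), (5, 6, "ORG"), (8, 9, "PER")]
f1 = entity_f_score(reference, hypothesis)
print(f"F1 = {f1:.2f}")
```

Here precision is 1/3 and recall is 1/2, giving F1 = 0.40.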

Edinburgh Research Archive